Towards a Low Cost ETL System
Abstract
Data warehouses store integrated and consistent data in a subject-oriented repository dedicated to supporting business intelligence processes. However, keeping these repositories up to date usually involves complex and time-consuming processes, commonly known as Extract-Transform-Load (ETL) tasks. These data-intensive tasks normally run within a limited time window, and their computational requirements tend to grow over time as data volumes increase. We therefore believe that a grid environment is well suited to serve as the backbone of the technical infrastructure, with the clear financial advantage of using desktop computers the organization already owns. This article proposes a different approach to distributing ETL processes in a grid environment, one that takes into account not only the processing performance of the grid's nodes but also the available bandwidth, in order to estimate the grid's availability in the near future and thereby optimize workflow distribution.
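The abstract does not spell out the scheduling model, so the following is only a minimal sketch of the general idea of weighing both node performance and bandwidth when placing ETL tasks. Everything in it is an assumption made for illustration: the `Node` and `Task` types, the cost formula (transfer time plus compute time), and the greedy placement rule are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    ops_per_sec: float       # assumed estimate of processing performance
    bytes_per_sec: float     # assumed estimate of available bandwidth
    busy_until: float = 0.0  # seconds from now until the node is free


@dataclass
class Task:
    name: str
    input_bytes: float  # data that must be shipped to the node
    ops: float          # abstract amount of processing work


def estimated_cost(node: Node, task: Task) -> float:
    """Estimated seconds to transfer the input and then process it."""
    return task.input_bytes / node.bytes_per_sec + task.ops / node.ops_per_sec


def distribute(tasks: list[Task], nodes: list[Node]) -> dict[str, str]:
    """Greedily map each task to the node with the earliest estimated finish.

    Note: this updates each node's busy_until as tasks are placed.
    """
    plan: dict[str, str] = {}
    # Place the heaviest tasks first so they get the best choice of node.
    for task in sorted(tasks, key=lambda t: t.ops, reverse=True):
        best = min(nodes, key=lambda n: n.busy_until + estimated_cost(n, task))
        best.busy_until += estimated_cost(best, task)
        plan[task.name] = best.name
    return plan


nodes = [
    Node("A", ops_per_sec=100.0, bytes_per_sec=10.0),  # fast CPU, slow link
    Node("B", ops_per_sec=10.0, bytes_per_sec=100.0),  # slow CPU, fast link
]
tasks = [
    Task("extract", input_bytes=1000.0, ops=10.0),   # transfer-heavy
    Task("transform", input_bytes=10.0, ops=1000.0), # compute-heavy
]
plan = distribute(tasks, nodes)
```

In this toy setup the transfer-heavy task lands on the node with the faster link and the compute-heavy task on the node with the faster CPU, which a scheduler looking at processing performance alone would not do.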
Similar resources
Co-evolution Model for Data Sources and Views
ETL process evolution is investigated below. A model-driven approach to templates and ETL process evolution problem is developed. We suppose that the ETL process evolution problem is mainly a problem of a low abstraction level. So the definition of ETL process based on a conceptual model is a principal step towards effective ETL evolution. Our approach seems to be scalable, robust and simpler i...
Logical Optimization of ETL Workflows
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. Usually, these processes must be completed in a certain time window; thus, it is necessary to optimize their execution time. In this paper, we delve into the logical optimization of ETL processes, mo...
Determining Essential Statistics for Cost Based Optimization of an ETL Workflow
Many of the ETL products on the market today provide tools for designing ETL workflows, with very little or no support for optimizing such workflows. Optimization of ETL workflows poses several new challenges compared to traditional query optimization in database systems. There have been many attempts both in the industry and the research community to support cost-based optimization techniq...
Improving the Extract, Transform, and Load Process in the Data Warehouse with the Help of Parallel Processing
Data warehouses are used to store data in a structure that facilitates data analysis. The Extract, Transform, Load (ETL) process retrieves the required data from the source systems and loads it into the data warehouse. Although the structure of the source data (e.g. ER model) and of the DW (e.g. star schema) is usually specified, there is a clear lack of a ...
A Generic Procedure for Integration Testing of ETL Procedures
Testing is one of the key factors in any software product's success, and data warehouse systems are no exception. A data warehouse can be tested in different ways (e.g. front-end testing, database testing), but testing the data warehouse's ETL procedures (sometimes called back-end testing [1]) is probably the most complex and critical data warehouse testing job, because it directly affects the qual...
Publication date: 2014